Hidden Variables in Bipartite Networks 



Maksim Kitsak and Dmitri Krioukov

Cooperative Association for Internet Data Analysis (CAIDA), University of California,
San Diego (UCSD), 9500 Gilman Drive, La Jolla, CA 92093, USA 
(Dated: January 19, 2013) 

We introduce and study random bipartite networks with hidden variables. Nodes in these networks 
are characterized by hidden variables which control the appearance of links between node pairs. 
We derive analytic expressions for the degree distribution, degree correlations, the distribution of 
the number of common neighbors, and the bipartite clustering coefficient in these networks. We 
also establish the relationship between degrees of nodes in original bipartite networks and in their 
unipartite projections. We further demonstrate how hidden variable formalism can be applied 
to analyze topological properties of networks in certain bipartite network models, and verify our 
analytical results in numerical simulations. 

PACS numbers: 89.75.Hc, 05.45.Df, 64.60.Ak



I. INTRODUCTION 

Bipartite networks are composed of two types of nodes 
with no links connecting nodes of the same type, see 
Fig. 1(a). Examples include recommendation systems [1],
networks of collaborations [2] and metabolic reactions [3],
gene regulatory networks [4], peer to peer networks [5],
pollination networks [6], and many others [7]. Compared
to traditional unipartite networks, less is known about 
the organizing principles determining the structure and 
evolution of bipartite networks, partly because only uni- 
partite projections of bipartite networks are often consid- 
ered. The unipartite projection accounts for connecting 
two nodes of one type by a link if these nodes share at 
least one neighbor of the other type, and then throwing 
out all nodes of this other type, see Figs. 1(b) and 1(c).
Even though this procedure allows one to study bipar- 
tite networks using powerful tools developed for unipar- 
tite networks, the unipartite projections in most cases 
lead to significant loss of information, and to artificial 
inflation of the projected network with fully connected 
subgraphs [7, 8].

Nodes in real bipartite networks can often be charac- 
terized by a number of intrinsic attributes. For example, 
in recommendation networks, composed of consumer and 
product nodes, a consumer-product pair is connected if 
the consumer has purchased the product. Consumers can 
be characterized by their age, geographic location, in- 
come, sex, lifestyle, etc., while products have their type, 
price, quality, uniqueness, and other properties. Con- 
sumers do not buy products at random. Making their 
purchase decisions, consumers implicitly match their at- 
tributes with those of products. For example, a person 
with a higher income is more likely to purchase an expen- 
sive item, books in Italian are mostly purchased by peo- 
ple who speak Italian, consumers at a gas station tend to 
own a car, etc. Similar considerations apply to the forma- 
tion of links between researchers and scientific projects, 
molecules and reactions in which they participate, and 
so forth. 

FIG. 1. (Color Online) A toy bipartite network and its unipartite
projections. (a) Original bipartite network. We refer to the nodes
of one type as top nodes (labeled by letters) and to the nodes of
the other type as bottom nodes (labeled by numbers). Unipartite
projections of the original network onto (b) bottom and (c) top
domains. The top (bottom) nodes are connected in the projections
if they have at least one common neighbor in the original network.

The concept of hidden variables formalizes these observations
as follows. Every node of each type in a bipartite
network is assigned a number of hidden variables drawn 
from some distributions, and then every node pair of dif- 
ferent types is connected with some probability which 
depends on the hidden variables of the two nodes. In 
this work we build the hidden variable formalism for bipartite
networks, based on the formalism developed earlier
for unipartite networks [9]. Specifically, in Section II
we overview basic topological characteristics of bipartite 
networks. In Section III we define a general class of bi- 
partite networks with hidden variables, and study an- 
alytically the topological properties of networks in this 
class. In Section IV we consider two specific examples 
of bipartite networks with hidden variables, uncorrelated 
and stratified bipartite networks, and confirm in simula- 
tions our analytical results for these networks. Section V 
summarizes the paper.



II. TOPOLOGICAL CHARACTERISTICS OF
BIPARTITE NETWORKS

In this section we review some key relationships among 
the basic topological characteristics of bipartite networks. 

Let the nodes of two different types be called top and
bottom nodes, see Figs. 1(b) and 1(c). Similar to unipartite
networks, the degree correlations in bipartite networks
are defined by the number of links $E_{k\ell}$ between top
and bottom nodes of degrees $k$ and $\ell$ [10]. The correlation
matrix $E_{k\ell}$ satisfies the following equations:

$$\sum_{\ell} E_{k\ell} = k N_k, \qquad \sum_{k} E_{k\ell} = \ell M_\ell, \qquad \sum_{k,\ell} E_{k\ell} = E, \tag{1}$$

where $N_k$ and $M_\ell$ are the numbers of top and bottom
nodes of degree $k$ and $\ell$, and $E$ is the total number of
links in the network. The joint degree distribution $P(k,\ell)$
is the normalized correlation matrix, i.e., the probability
that a randomly chosen edge connects nodes of degrees
$k$ and $\ell$:



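$$P(k,\ell) = \frac{E_{k\ell}}{E}, \tag{2}$$

which contains all information needed to construct a network
with a given degree distribution and correlations.

The top and bottom node degree distributions $P(k)$
and $P(\ell)$ can be obtained from Eq. (1):

$$P(k) = \frac{N_k}{N} = \frac{\bar{k}}{k} \sum_{\ell} P(k,\ell), \qquad P(\ell) = \frac{M_\ell}{M} = \frac{\bar{\ell}}{\ell} \sum_{k} P(k,\ell), \tag{3}$$

where $N$ and $M$ are the numbers of top and bottom nodes, and
$\bar{k} = E/N$ and $\bar{\ell} = E/M$ are their average degrees.
The conditional probabilities $P(\ell|k)$ and $P(k|\ell)$ that an
edge emanating from a $k$- or $\ell$-degree node is connected
to a node of degree $\ell$ or $k$ are

$$P(\ell|k) = \frac{E_{k\ell}}{k N_k} = \frac{\bar{k}\, P(k,\ell)}{k P(k)}, \tag{4}$$

$$P(k|\ell) = \frac{E_{k\ell}}{\ell M_\ell} = \frac{\bar{\ell}\, P(k,\ell)}{\ell P(\ell)}. \tag{5}$$

To characterize degree correlations in unipartite networks,
one often considers the average nearest neighbor
degree (ANND), which is the average degree of the neighbors
of all $k$-degree nodes [11]. The ANNDs for top and
bottom nodes in a bipartite network are

$$\bar{\ell}_{nn}(k) = \sum_{\ell} \ell\, P(\ell|k), \qquad \bar{k}_{nn}(\ell) = \sum_{k} k\, P(k|\ell). \tag{6}$$

In uncorrelated bipartite networks

$$P^{unc}(k,\ell) = \frac{k P(k)}{\bar{k}}\, \frac{\ell P(\ell)}{\bar{\ell}}. \tag{7}$$

As a result, $P(\ell|k)$ and $P(k|\ell)$ do not depend on $k$ and
$\ell$, respectively:

$$P^{unc}(\ell|k) = \frac{\ell}{\bar{\ell}} P(\ell), \qquad P^{unc}(k|\ell) = \frac{k}{\bar{k}} P(k), \tag{8}$$

and neither do the ANNDs:

$$\bar{\ell}^{unc}_{nn}(k) = \frac{\overline{\ell^2}}{\bar{\ell}}, \qquad \bar{k}^{unc}_{nn}(\ell) = \frac{\overline{k^2}}{\bar{k}}. \tag{9}$$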

Networks with increasing or decreasing ANNDs are called 
assortative or disassortative [12]. Some real bipartite 
networks have nontrivial degree correlation profiles, and
therefore they cannot be classified as either assortative
or disassortative [7].
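
As an illustration of Eqs. (1), (4), (5), and (6), the following Python sketch tabulates the correlation matrix from an edge list and returns the two ANND functions. The function name `annd_from_edges` and the edge-list input format are assumptions made for this example; they are not part of the original formalism.

```python
from collections import defaultdict


def annd_from_edges(edges):
    """Return (l_nn, k_nn), the ANNDs of Eq. (6), keyed by node degree.

    edges: iterable of (top, bottom) pairs, each link listed once.
    l_nn[k] = sum_l l * P(l|k) for top nodes of degree k,
    k_nn[l] = sum_k k * P(k|l) for bottom nodes of degree l.
    """
    edges = list(edges)
    k = defaultdict(int)  # top node degrees
    l = defaultdict(int)  # bottom node degrees
    for t, b in edges:
        k[t] += 1
        l[b] += 1

    # Correlation matrix: E[(k, l)] counts links between a degree-k top
    # node and a degree-l bottom node.
    E = defaultdict(int)
    for t, b in edges:
        E[(k[t], l[b])] += 1

    ends_k = defaultdict(int)  # sum_l E_kl = k * N_k, Eq. (1)
    ends_l = defaultdict(int)  # sum_k E_kl = l * M_l, Eq. (1)
    sum_l = defaultdict(float)
    sum_k = defaultdict(float)
    for (kk, ll), n in E.items():
        ends_k[kk] += n
        ends_l[ll] += n
        sum_l[kk] += ll * n
        sum_k[ll] += kk * n

    l_nn = {kk: sum_l[kk] / ends_k[kk] for kk in ends_k}
    k_nn = {ll: sum_k[ll] / ends_l[ll] for ll in ends_l}
    return l_nn, k_nn


toy = [("A", 1), ("A", 2), ("B", 2), ("B", 3), ("C", 2)]
l_nn, k_nn = annd_from_edges(toy)
print(l_nn, k_nn)
```

A flat `l_nn` or `k_nn` would indicate an uncorrelated network in the sense of Eq. (9); increasing or decreasing values correspond to assortative or disassortative mixing.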

The standard clustering coefficient of node $i$ quantifies
how close $i$'s neighbors are to forming a clique [13]:

$$c(i) = \frac{2}{k_i(k_i - 1)} \sum_{j>k} e_{jk}, \tag{10}$$



where the summation is over all pairs of $i$'s neighbors $j$
and $k$, and $e_{jk}$ is the adjacency matrix. Since in bipartite
networks there are no loops of size 3, this clustering
coefficient is zero for all nodes. Therefore, to assess the
density of connections in the vicinity of a particular node,
one has to analyze connections among its second nearest
neighbors. There have been several attempts to generalize
the clustering coefficient for bipartite networks using
this idea [3, 14]. Here we focus on the definition by
Zhang et al. [14]:

$$c_B(i) = \frac{\sum_{m>n} q_{imn}}{\sum_{m>n} \left[ q_{imn} + (k_m - \eta_{imn}) + (k_n - \eta_{imn}) \right]}, \tag{11}$$

where the sums go over all pairs of $i$'s neighbors, $q_{imn}$ is
the number of common neighbors between nodes $m$ and
$n$ excluding $i$, $k_m$ and $k_n$ are the degrees of nodes $m$ and
$n$, and $\eta_{imn} = 1 + q_{imn} + \theta_{mn}$. Here $\theta_{mn} = 1$ if $m$ and $n$
are directly connected and $0$ otherwise; two neighbors of the same
node in a bipartite network are never directly connected, so
$\theta_{mn} = 0$. The above definition may
look cumbersome, but it has a simple interpretation. Let
$A_m$ and $A_n$ be the sets of neighbors of nodes $m$ and $n$
excluding $i$. Then $q_{imn} = \|A_m \cap A_n\|$ is the size of the
intersection of $A_m$ and $A_n$, while
$q_{imn} + (k_m - \eta_{imn}) + (k_n - \eta_{imn}) = \|A_m \cup A_n\|$
is the size of their union. Therefore, the
bipartite clustering coefficient is simply

$$c_B(i) = \frac{\sum_{m>n} \|A_m \cap A_n\|}{\sum_{m>n} \|A_m \cup A_n\|}. \tag{12}$$

The ratio of the intersection and union of two sets is
known as the Jaccard similarity coefficient. The bipartite
clustering coefficient, on the other hand, is given
by the ratio of the sums of intersections and unions for
all pairs of $i$'s neighbors. Therefore, the bipartite clustering
coefficient can be interpreted as a combined Jaccard
similarity of $i$'s neighbors. Regardless of the clustering
definition details, nodes in real bipartite networks tend
to be strongly clustered, as compared to nodes in their
randomized counterparts with preserved degree distributions [7].
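
The set form of Eq. (12) translates almost directly into code. The sketch below is one possible implementation, assuming the network is stored as a dictionary `adj` mapping every node (of either type) to the set of its neighbors; the function name and the convention of returning 0 for nodes with fewer than two neighbors are choices made for this example.

```python
from itertools import combinations


def bipartite_clustering(adj, i):
    """Bipartite clustering coefficient c_B(i) in the set form of Eq. (12).

    adj: dict mapping each node to the set of its neighbors.
    Returns 0.0 when i has fewer than two neighbors or all unions are empty.
    """
    intersections = 0
    unions = 0
    for m, n in combinations(adj[i], 2):
        a_m = adj[m] - {i}  # neighbors of m excluding i
        a_n = adj[n] - {i}  # neighbors of n excluding i
        intersections += len(a_m & a_n)
        unions += len(a_m | a_n)
    return intersections / unions if unions else 0.0


# Toy network: top nodes are letters, bottom nodes are numbers.
adj = {
    "A": {1, 2}, "B": {2, 3}, "C": {1, 2},
    1: {"A", "C"}, 2: {"A", "B", "C"}, 3: {"B"},
}
print(bipartite_clustering(adj, 2), bipartite_clustering(adj, "A"))
```

Because the sums in the numerator and denominator are accumulated separately, the result is a ratio of sums rather than an average of pairwise Jaccard coefficients, matching the interpretation given above.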



III. HIDDEN VARIABLE FORMALISM FOR 
BIPARTITE NETWORKS 

We define the class of bipartite networks with hidden
variables as follows:

(i) Each top node $i$ and each bottom node $j$ are assigned
hidden variables $\kappa_i$ and $\lambda_j$ drawn from probability
distributions $\rho_t(\kappa)$ and $\rho_b(\lambda)$;

(ii) Each top-bottom node pair $\{i, j\}$ is connected
with probability $r(\kappa_i, \lambda_j)$, $0 \le r(\kappa, \lambda) \le 1$.

The hidden variable formalism developed here is valid
for both discrete and continuous variables. In the latter
case, all sums must be replaced by integrals. We are
primarily interested in the cases where the hidden variable
distributions $\rho_t(\kappa)$ and $\rho_b(\lambda)$ are independent of the
sizes of the top and bottom domains $N$ and $M$. We also
assume that in the thermodynamic limit of large $N$ and $M$,
these sizes are proportional to each other, $N \propto M$. For
the sake of clarity we consider only one hidden variable
per node. The generalization to several hidden variables
per node is straightforward. We also drop indices in the
top and bottom hidden variable distribution notations:
$\rho_t(\kappa) \to \rho(\kappa)$ and $\rho_b(\lambda) \to \rho(\lambda)$.
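
Steps (i) and (ii) can be sketched as a direct generator. The code below is a minimal illustration, assuming the hidden variables have already been sampled and are passed in as lists; the function name `generate_bipartite` and its signature are hypothetical, and the quadratic loop over all node pairs is only practical for small networks.

```python
import random


def generate_bipartite(kappas, lambdas, r, rng=random):
    """Sample one network from the hidden variable model.

    kappas:  hidden variables of the N top nodes (step i).
    lambdas: hidden variables of the M bottom nodes (step i).
    r:       callable r(kappa, lam) returning a probability in [0, 1].
    rng:     object with a random() method (the random module by default).
    Returns a list of (top_index, bottom_index) links (step ii).
    """
    edges = []
    for i, kappa in enumerate(kappas):
        for j, lam in enumerate(lambdas):
            # Each top-bottom pair is linked independently of all others.
            if rng.random() < r(kappa, lam):
                edges.append((i, j))
    return edges


rng = random.Random(1)
kappas = [rng.uniform(1.0, 3.0) for _ in range(50)]
lambdas = [rng.uniform(1.0, 3.0) for _ in range(100)]
edges = generate_bipartite(kappas, lambdas, lambda k, l: k * l / 200.0, rng)
print(len(edges))
```

Passing an explicit `random.Random` instance makes a run reproducible, which is convenient when comparing simulations with the analytical expressions derived below.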



A. Degree distributions 

We first compute the most basic topological properties
of the networks in the model: the degree distributions
and average degrees. Due to the stochastic nature of
connections between top and bottom nodes, we cannot
compute the degree of a top node with hidden variable
$\kappa$ deterministically. Instead, we can compute the propagator
$g(k|\kappa)$, which is the probability that a node with hidden
variable $\kappa$ ends up connecting to $k$ bottom nodes. Similarly,
the propagator $f(\ell|\lambda)$ is the probability that a bottom
node with hidden variable $\lambda$ will be connected to $\ell$ top
nodes. Propagators $g(k|\kappa)$ and $f(\ell|\lambda)$ are the main building
blocks of the hidden variable formalism. As soon as
we know $g(k|\kappa)$, for example, the average degree $\bar{k}(\kappa)$ of a
top node with hidden variable $\kappa$, the degree distribution
$P(k)$, and the average degree $\bar{k}$ in the top node domain
are given by:

$$\bar{k}(\kappa) = \sum_{k} k\, g(k|\kappa), \tag{13}$$

$$P(k) = \sum_{\kappa} g(k|\kappa)\, \rho(\kappa), \tag{14}$$

$$\bar{k} = \sum_{k} k P(k) = \sum_{\kappa} \bar{k}(\kappa)\, \rho(\kappa), \tag{15}$$

while the corresponding expressions for bottom nodes can
be obtained by an appropriate swap of notations.

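To compute the propagator $g(k|\kappa)$ we first compute the partial
propagator $g^{\lambda_i}(k_i|\kappa)$, defined as the probability that
a top node with hidden variable $\kappa$ ends up having $k_i$ connections
to bottom nodes with hidden variable $\lambda_i$. Since
links between node pairs appear independently from one
pair to another, $g^{\lambda_i}(k_i|\kappa)$ is given by the binomial distribution:

$$g^{\lambda_i}(k_i|\kappa) = \binom{M_{\lambda_i}}{k_i} \left[ r(\kappa, \lambda_i) \right]^{k_i} \left[ 1 - r(\kappa, \lambda_i) \right]^{M_{\lambda_i} - k_i}, \tag{16}$$

where $\binom{M_{\lambda_i}}{k_i}$ is the binomial coefficient, and $M_{\lambda_i} = M\rho(\lambda_i)$
is the total number of bottom nodes with hidden variable
$\lambda_i$. The full propagator $g(k|\kappa)$ is then a convolution of
partial propagators:

$$g(k|\kappa) = \sum_{\sum_i k_i = k} \; \prod_{i} g^{\lambda_i}(k_i|\kappa), \tag{17}$$

where the product is over the entire spectrum of hidden
variables $\lambda$, while the summation is over the ensemble of
all possible degrees $k_i$ whose sum is $k$.

Since the full propagator is a convolution, its generating
function $\hat{g}(z|\kappa)$ is a product of the generating functions
$\hat{g}^{\lambda}(z|\kappa)$ for partial propagators:

$$\hat{g}(z|\kappa) = \prod_{\lambda} \hat{g}^{\lambda}(z|\kappa), \quad \text{where} \tag{18}$$

$$\hat{g}(z|\kappa) = \sum_{k} g(k|\kappa)\, z^{k}, \tag{19}$$

$$\hat{g}^{\lambda}(z|\kappa) = \sum_{k} g^{\lambda}(k|\kappa)\, z^{k}. \tag{20}$$

The generating function for the binomial $g^{\lambda}(k|\kappa)$ is

$$\hat{g}^{\lambda}(z|\kappa) = \left[ 1 - (1 - z)\, r(\kappa, \lambda) \right]^{M_\lambda}, \tag{21}$$

substituting which into Eq. (18) we obtain

$$\ln \hat{g}(z|\kappa) = M \sum_{\lambda} \rho(\lambda) \ln \left[ 1 - (1 - z)\, r(\kappa, \lambda) \right]. \tag{22}$$

The average degree of nodes with hidden variable $\kappa$ is
given by the derivative of $\hat{g}(z|\kappa)$ at $z = 1$ [17], to confirm
the obvious

$$\bar{k}(\kappa) = M \sum_{\lambda} \rho(\lambda)\, r(\kappa, \lambda), \tag{23}$$

while higher moments of $g(k|\kappa)$ can be computed by taking
higher order derivatives of the generating function.
Eq. (23) yields the average degree in the entire top node
domain

$$\bar{k} = \sum_{\kappa} \bar{k}(\kappa)\, \rho(\kappa) = M \sum_{\kappa, \lambda} \rho(\kappa) \rho(\lambda)\, r(\kappa, \lambda), \tag{24}$$

and the expected total number of links in the network

$$E = N\bar{k} = M\bar{\ell} = NM \sum_{\kappa, \lambda} \rho(\kappa) \rho(\lambda)\, r(\kappa, \lambda). \tag{25}$$

It is evident from the last equation that to end up with
a sparse bipartite network, $E \propto N \propto M$, the connection
probability $r(\kappa, \lambda)$ must be of the form

$$r(\kappa, \lambda) = \tilde{r}(\kappa, \lambda)/M, \tag{26}$$

where $\tilde{r}(\kappa, \lambda)$ is independent of $M$. Therefore, for large
sparse networks we can expand the logarithm in Eq. (22)
in powers of $r(\kappa, \lambda)$ to finally obtain, in the first order,

$$\ln \hat{g}(z|\kappa) \approx (z - 1) \sum_{\lambda} \rho(\lambda)\, \tilde{r}(\kappa, \lambda), \tag{27}$$

$$g(k|\kappa) = e^{-\bar{k}(\kappa)} \left[ \bar{k}(\kappa) \right]^{k} / k!, \tag{28}$$

which we can use to compute the degree distribution in
Eq. (14). Propagator $f(\ell|\lambda)$ and degree distribution $P(\ell)$
for bottom nodes can be obtained from Eqs. (28) and (14)
by swapping $\kappa \leftrightarrow \lambda$ and $k \leftrightarrow \ell$.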
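
The passage from the exact convolution of Eq. (17) to the Poisson propagator of Eq. (28) can be checked numerically. The sketch below assumes a discrete hidden variable spectrum and builds the exact distribution by adding one Bernoulli trial per bottom node, which is equivalent to convolving the binomials of Eq. (16); the helper names and the example parameters are illustrative only.

```python
import math


def exact_propagator(probs):
    """Exact distribution of the number of links of one top node.

    probs: connection probabilities r(kappa, lambda_j), one per bottom node.
    Equivalent to the convolution of binomials in Eqs. (16) and (17).
    """
    dist = [1.0]
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            new[k] += w * (1.0 - p)  # no link to this bottom node
            new[k + 1] += w * p      # link to this bottom node
        dist = new
    return dist


def poisson_pmf(mean, k):
    """Poisson probability e^{-mean} mean^k / k!, as in Eq. (28)."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


# Assumed example: M = 1000 bottom nodes, half with lambda = 1 and half with
# lambda = 3, a top node with kappa = 2, and r = kappa * lambda / M (Eq. 26).
M = 1000
probs = [2.0 * 1.0 / M] * (M // 2) + [2.0 * 3.0 / M] * (M // 2)
exact = exact_propagator(probs)
mean = sum(probs)  # average degree, Eq. (23)
worst = max(abs(exact[k] - poisson_pmf(mean, k)) for k in range(30))
print(mean, worst)
```

For these parameters the largest pointwise deviation between the exact and Poisson propagators is small compared with the probabilities themselves, and it shrinks further as M grows at fixed mean degree.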

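The Poisson form of the propagator $g(k|\kappa)$, given by
Eq. (28), implies that

$$\overline{k^2}(\kappa) = \left[ \bar{k}(\kappa) \right]^2 + \bar{k}(\kappa). \tag{29}$$

Furthermore, Eq. (14) allows us to obtain the second moment
of the degree distribution:

$$\overline{k^2} = \sum_{k} k^2 P(k) = \sum_{\kappa} \left[ \bar{k}(\kappa) \right]^2 \rho(\kappa) + \sum_{\kappa} \bar{k}(\kappa)\, \rho(\kappa). \tag{30}$$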



B. Unipartite projection 

Next we establish the connection between the degrees
of nodes in a bipartite network and in its unipartite projections,
often considered in the literature. In the top
unipartite projection, two top nodes are connected if they
have at least one common bottom neighbor in the bipartite
network. Therefore, we first compute the probability
$p_0(\kappa_1, \kappa_2)$ that two top nodes with hidden variables $\kappa_1$
and $\kappa_2$ do not have any common bottom neighbors in
the bipartite network. This probability is

$$p_0(\kappa_1, \kappa_2) = \prod_{i} \left[ 1 - r(\kappa_1, \lambda_i)\, r(\kappa_2, \lambda_i) \right], \tag{31}$$

where the product is over all the bottom nodes. Taking
the logarithm on both sides, we get

$$\ln p_0(\kappa_1, \kappa_2) = M \sum_{\lambda} \rho(\lambda) \ln \left[ 1 - r(\kappa_1, \lambda)\, r(\kappa_2, \lambda) \right], \tag{32}$$

and the probability $p_u(\kappa_1, \kappa_2) = 1 - p_0(\kappa_1, \kappa_2)$ that two
top nodes with hidden variables $\kappa_1$ and $\kappa_2$ are connected
in the unipartite projection is simply

$$p_u(\kappa_1, \kappa_2) = 1 - \exp \left\{ M \sum_{\lambda} \rho(\lambda) \ln \left[ 1 - r(\kappa_1, \lambda)\, r(\kappa_2, \lambda) \right] \right\}. \tag{33}$$

In sparse networks we use Eq. (26) to approximate
$p_u(\kappa_1, \kappa_2)$ as

$$p_u(\kappa_1, \kappa_2) \approx M \sum_{\lambda} \rho(\lambda)\, r(\kappa_1, \lambda)\, r(\kappa_2, \lambda). \tag{34}$$
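
To see how close the sparse approximation of Eq. (34) is to the exact expression of Eqs. (31) and (33), one can evaluate both for a concrete list of bottom-node hidden variables. The sketch below assumes the bottom hidden variables are given explicitly as a list, so that the sum over bottom nodes plays the role of $M \sum_\lambda \rho(\lambda)$; the function names and example values are hypothetical.

```python
import math


def p_u_exact(kappa1, kappa2, lambdas, r):
    """Exact projection link probability, Eqs. (31) and (33)."""
    log_p0 = sum(math.log1p(-r(kappa1, lam) * r(kappa2, lam)) for lam in lambdas)
    return 1.0 - math.exp(log_p0)


def p_u_sparse(kappa1, kappa2, lambdas, r):
    """First-order sparse approximation, Eq. (34)."""
    return sum(r(kappa1, lam) * r(kappa2, lam) for lam in lambdas)


# Assumed example: M bottom nodes with lambda = 3 and r = kappa * lambda / (3 M).
M = 10000
lambdas = [3.0] * M


def r(kappa, lam):
    return kappa * lam / (3.0 * M)


print(p_u_exact(2.0, 5.0, lambdas, r), p_u_sparse(2.0, 5.0, lambdas, r))
```

The approximation never falls below the exact value, since Eq. (34) is the union bound on the probability of having at least one common neighbor.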

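Next we find the propagator $p(k_u|\kappa)$, the conditional probability
that a top node with hidden variable $\kappa$ has $k_u$
connections in the unipartite projection. The derivation
is similar to the derivation of propagator $g(k|\kappa)$ for
the bipartite network. We first define the partial propagator
$p^{\kappa'_i}(n_i|\kappa)$, the probability that a top node with hidden
variable $\kappa$ is connected in the unipartite projection to $n_i$
nodes with hidden variable $\kappa'_i$. Equation (34) indicates
that a node with hidden variable $\kappa$ is equally likely to
be connected in the unipartite projection to any of the $N_{\kappa'_i}$
nodes with hidden variable $\kappa'_i$, where $N_{\kappa'_i} = N\rho(\kappa'_i)$ is the
number of top nodes with hidden variable $\kappa'_i$. If $n_i \ll M$,
we can assume that the links in the unipartite projection
are independent, leading to the binomial $p^{\kappa'_i}(n_i|\kappa)$:

$$p^{\kappa'_i}(n_i|\kappa) = \binom{N_{\kappa'_i}}{n_i} \left[ p_u(\kappa, \kappa'_i) \right]^{n_i} \left[ 1 - p_u(\kappa, \kappa'_i) \right]^{N_{\kappa'_i} - n_i}. \tag{35}$$

Similar to Eq. (17), $p(k_u|\kappa)$ is then a convolution

$$p(k_u|\kappa) = \sum_{\sum_i n_i = k_u} \; \prod_{i} p^{\kappa'_i}(n_i|\kappa), \tag{36}$$

and its generating function $\hat{p}(z|\kappa) = \sum_{k_u} p(k_u|\kappa)\, z^{k_u}$ is

$$\ln \hat{p}(z|\kappa) = N \sum_{\kappa'} \rho(\kappa') \ln \left[ 1 - (1 - z)\, p_u(\kappa, \kappa') \right]. \tag{37}$$

Therefore if $p_u(\kappa, \kappa')$ scales as

$$p_u(\kappa, \kappa') \sim N^{-a}, \tag{38}$$

with $a \ge 1$, then similar to the bipartite case, propagator
$p(k_u|\kappa)$ is approximately the Poisson distribution:

$$p(k_u|\kappa) \approx e^{-\bar{k}_u(\kappa)} \left[ \bar{k}_u(\kappa) \right]^{k_u} / k_u!. \tag{39}$$

The average degree $\bar{k}_u(\kappa)$ of nodes with hidden variable
$\kappa$ in the unipartite projection is given by the first
derivative of the generating function $\hat{p}(z|\kappa)$ at $z = 1$ to
yield the obvious

$$\bar{k}_u(\kappa) = N \sum_{\kappa'} \rho(\kappa')\, p_u(\kappa, \kappa'), \tag{40}$$

which for sparse networks using Eq. (34) transforms to:

$$\bar{k}_u(\kappa) = NM \sum_{\kappa', \lambda} \rho(\kappa') \rho(\lambda)\, r(\kappa, \lambda)\, r(\kappa', \lambda) \tag{41}$$

$$\phantom{\bar{k}_u(\kappa)} = M \sum_{\lambda} \bar{\ell}(\lambda)\, \rho(\lambda)\, r(\kappa, \lambda), \tag{42}$$

where $\bar{\ell}(\lambda)$ is the average degree of bottom nodes with
hidden variable $\lambda$ in the bipartite network. The average
degree in the entire top unipartite projection is then

$$\bar{k}_u = \sum_{\kappa} \rho(\kappa)\, \bar{k}_u(\kappa) = \frac{M}{N} \sum_{\lambda} \rho(\lambda) \left[ \bar{\ell}(\lambda) \right]^2. \tag{43}$$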



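Finally, the degree distribution in the unipartite projection
is

$$P(k_u) = \sum_{\kappa} p(k_u|\kappa)\, \rho(\kappa). \tag{44}$$



C. Number of common neighbors

The common neighbor statistics are useful in many applications,
such as node similarity estimation [18] and
link prediction [19]. We compute the probability $P_{\kappa_1,\kappa_2}(m)$ that
two top nodes with hidden variables $\kappa_1$ and $\kappa_2$ have $m$
common bottom neighbors. This probability can be calculated as

$$P_{\kappa_1,\kappa_2}(m) = \sum_{\sum_i m_i = m} \; \prod_{i} P_{\kappa_1,\kappa_2}(m_i|\lambda_i), \tag{45}$$

where $P_{\kappa_1,\kappa_2}(m_i|\lambda_i)$ is the probability that two top nodes
with $\kappa_1$ and $\kappa_2$ have $m_i$ common bottom neighbors with
$\lambda_i$, and the product is over the entire range of $\lambda_i$, while
the summation is over all possible combinations of $m_i$
adding up to $m$.

Consider two nodes with hidden variables $\kappa_1$ and $\kappa_2$.
Each common neighbor of the two nodes with $\kappa_1$ and $\kappa_2$
appears independently with probability

$$r_\lambda(\kappa_1, \kappa_2) = r(\kappa_1, \lambda)\, r(\kappa_2, \lambda), \tag{46}$$

where $\lambda$ is the hidden variable of the common neighbor.
Therefore, $P_{\kappa_1,\kappa_2}(m|\lambda)$ is binomial:

$$P_{\kappa_1,\kappa_2}(m|\lambda) = \binom{M_\lambda}{m} \left[ r_\lambda(\kappa_1, \kappa_2) \right]^{m} \left[ 1 - r_\lambda(\kappa_1, \kappa_2) \right]^{M_\lambda - m}, \tag{47}$$

and the corresponding generating function is given by

$$\hat{P}_{\kappa_1,\kappa_2}(z|\lambda) = \left[ 1 - (1 - z)\, r_\lambda(\kappa_1, \kappa_2) \right]^{M_\lambda}. \tag{48}$$

Since $P_{\kappa_1,\kappa_2}(m)$ is given by a convolution, its generating
function is

$$\hat{P}_{\kappa_1,\kappa_2}(z) = \prod_{i} \hat{P}_{\kappa_1,\kappa_2}(z|\lambda_i). \tag{49}$$

Combining the last two equations, we get

$$\ln \hat{P}_{\kappa_1,\kappa_2}(z) = M \sum_{\lambda} \rho(\lambda) \ln \left[ 1 - (1 - z)\, r_\lambda(\kappa_1, \kappa_2) \right]. \tag{50}$$

To compute the average number of common neighbors
between top nodes with $\kappa_1$ and $\kappa_2$ we evaluate the derivative
of $\hat{P}_{\kappa_1,\kappa_2}(z)$ with respect to $z$ at $z = 1$:

$$\bar{m}(\kappa_1, \kappa_2) = M \sum_{\lambda} \rho(\lambda)\, r_\lambda(\kappa_1, \kappa_2). \tag{51}$$

The generating function for the common neighbor distribution
has the same structure as $\hat{g}(z|\kappa)$. Therefore, the
closed form of $P_{\kappa_1,\kappa_2}(m)$ in the sparse network approximation
is given by

$$P_{\kappa_1,\kappa_2}(m) \approx e^{-\bar{m}(\kappa_1,\kappa_2)} \left[ \bar{m}(\kappa_1, \kappa_2) \right]^{m} / m!. \tag{52}$$



D. Degree correlations

The degree correlations in bipartite networks are fully
described by conditional probabilities $P(\ell|k)$ and $P(k|\ell)$
in Eqs. (4,5). In order to calculate $P(\ell|k)$ we need to
define the related conditional probability $p(\lambda|\kappa)$ that an
edge outgoing from a top node with hidden variable $\kappa$
is connected to a bottom node with hidden variable $\lambda$.
Then, $P(\ell|k)$ can be written as

$$P(\ell|k) = \sum_{\kappa, \lambda} f(\ell - 1|\lambda)\, p(\lambda|\kappa)\, g^{*}(\kappa|k), \tag{53}$$

where $f(\ell - 1|\lambda)$ is the conditional probability that a bottom
node with hidden variable $\lambda$ ends up having degree
$\ell$ (one connection is already taken into account by the
conditional edge), while the inverse propagator $g^{*}(\kappa|k)$
is the probability that a top node of degree $k$ has hidden
variable $\kappa$. This inverse propagator is given by Bayes'
formula [20]

$$P(k)\, g^{*}(\kappa|k) = \rho(\kappa)\, g(k|\kappa), \tag{54}$$

using which we write

$$P(\ell|k) = \frac{1}{P(k)} \sum_{\kappa, \lambda} \rho(\kappa)\, p(\lambda|\kappa)\, f(\ell - 1|\lambda)\, g(k|\kappa). \tag{55}$$

To determine $p(\lambda|\kappa)$ we note that the conditional probability
that an edge is connected to a bottom node with
$\lambda$, given that this edge is connected to a top node with $\kappa$,
is proportional to the density of bottom nodes $\rho(\lambda)$ and
the connection probability $r(\kappa, \lambda)$,

$$p(\lambda|\kappa) \propto \rho(\lambda)\, r(\kappa, \lambda). \tag{56}$$

Taking into account the normalization condition
$\sum_{\lambda} p(\lambda|\kappa) = 1$, we get

$$p(\lambda|\kappa) = \frac{\rho(\lambda)\, r(\kappa, \lambda)}{\sum_{\lambda'} \rho(\lambda')\, r(\kappa, \lambda')}. \tag{57}$$

Using Eqs. (55,57) we obtain the final expression for the
top ANND statistics:

$$\bar{\ell}_{nn}(k) = \sum_{\ell} \ell\, P(\ell|k) = 1 + \frac{1}{P(k)} \sum_{\kappa} \bar{\ell}_{nn}(\kappa)\, \rho(\kappa)\, g(k|\kappa), \tag{58}$$

where $\bar{\ell}_{nn}(\kappa)$ is the average nearest neighbor degree of
top nodes with hidden variable $\kappa$:

$$\bar{\ell}_{nn}(\kappa) = \sum_{\lambda} \bar{\ell}(\lambda)\, p(\lambda|\kappa). \tag{59}$$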
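
For discrete hidden variables, the conditional distribution of Eq. (57) and the hidden variable ANND of Eq. (59) reduce to short weighted sums. The following sketch assumes the distributions $\rho(\kappa)$ and $\rho(\lambda)$ are given as dictionaries mapping values to probabilities; the function names and the example parameters are assumptions of this illustration.

```python
def conditional_p(kappa, rho_b, r):
    """p(lambda|kappa) of Eq. (57) for a discrete bottom distribution rho_b."""
    weights = {lam: p * r(kappa, lam) for lam, p in rho_b.items()}
    norm = sum(weights.values())
    return {lam: w / norm for lam, w in weights.items()}


def l_nn_kappa(kappa, rho_t, rho_b, r, n_top):
    """Hidden variable ANND of Eq. (59).

    The mean bottom degree l(lambda) is Eq. (23) with the roles of top and
    bottom nodes swapped: l(lambda) = N * sum_kappa rho(kappa) r(kappa, lambda).
    """
    p = conditional_p(kappa, rho_b, r)
    l_bar = {
        lam: n_top * sum(pk * r(kap, lam) for kap, pk in rho_t.items())
        for lam in rho_b
    }
    return sum(l_bar[lam] * p[lam] for lam in rho_b)


# Assumed example: product-form connection probability r = kappa * lambda / C.
rho_t = {2.0: 0.5, 4.0: 0.5}
rho_b = {1.0: 0.5, 3.0: 0.5}
N = 100
C = 300.0  # mean(kappa) * N


def r(kappa, lam):
    return kappa * lam / C


print(l_nn_kappa(2.0, rho_t, rho_b, r, N), l_nn_kappa(4.0, rho_t, rho_b, r, N))
```

With a product-form connection probability the result is the same for every value of the top hidden variable, which is the hidden variable counterpart of the flat ANND of uncorrelated networks.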



E. Bipartite clustering coefficient 

Finally we derive the bipartite clustering coefficient as
defined by P. Zhang et al. [14]. Other variations of the bipartite
clustering coefficient can be computed in a similar
manner.

The bipartite clustering coefficient of top node $i$, given
by Eq. (11), can be written as

$$c_B(i) = \frac{\sum_{j>l} (m_{jl} - 1)}{\sum_{j>l} (k_j + k_l - m_{jl} - 1)}, \tag{60}$$

where $m_{jl}$ is the number of common neighbors between
bottom nodes $j$ and $l$, while $k_j$ and $k_l$ are their degrees.
Since the summations in the numerator and denominator
are performed independently, we can estimate the
average bipartite clustering coefficient of top nodes with
hidden variable $\kappa$ by calculating the ensemble averages of
the numerator and denominator. The details are in the
Appendix, while the answer is

$$c_B(\kappa) = \frac{\sum_{\lambda_1, \lambda_2} p(\lambda_1|\kappa)\, p(\lambda_2|\kappa)\, \bar{m}(\lambda_1, \lambda_2)}{2 \bar{\ell}_{nn}(\kappa) - \sum_{\lambda_1, \lambda_2} p(\lambda_1|\kappa)\, p(\lambda_2|\kappa)\, \bar{m}(\lambda_1, \lambda_2)}, \tag{61}$$

where $p(\lambda|\kappa)$ is the conditional probability that an edge
connected to a top node with hidden variable $\kappa$ is also
connected to a bottom node with hidden variable $\lambda$,
$\bar{m}(\lambda_1, \lambda_2)$ is the average number of common neighbors
between two bottom nodes with hidden variables $\lambda_1$ and
$\lambda_2$, and $\bar{\ell}_{nn}(\kappa)$ is the average nearest neighbor degree of
top nodes with hidden variable $\kappa$. The average bipartite
clustering coefficient of top nodes with degrees $k \ge 2$ can
be expressed in terms of $c_B(\kappa)$ as

$$c_B(k) = \frac{1}{P(k)} \sum_{\kappa} \rho(\kappa)\, g(k|\kappa)\, c_B(\kappa), \tag{62}$$

while the average bipartite clustering coefficient in the
top node domain is simply

$$\bar{c}_B = \sum_{\kappa} \rho(\kappa)\, c_B(\kappa) = \sum_{k} P(k)\, c_B(k). \tag{63}$$
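
Equation (61) can be evaluated directly once the hidden variable distributions are discrete. The sketch below is an illustration under that assumption: distributions are passed as dictionaries, the mean bottom degree and the mean number of common neighbors of two bottom nodes are obtained from Eqs. (23) and (51) with top and bottom roles swapped, and the function name `c_b_kappa` is hypothetical.

```python
def c_b_kappa(kappa, rho_t, rho_b, r, n_top):
    """Average bipartite clustering of top nodes with hidden variable kappa, Eq. (61)."""
    # p(lambda|kappa), Eq. (57)
    weights = {lam: p * r(kappa, lam) for lam, p in rho_b.items()}
    norm = sum(weights.values())
    p = {lam: w / norm for lam, w in weights.items()}

    # Mean bottom degree: Eq. (23) with top and bottom roles swapped.
    l_bar = {
        lam: n_top * sum(pk * r(kap, lam) for kap, pk in rho_t.items())
        for lam in rho_b
    }
    l_nn = sum(l_bar[lam] * p[lam] for lam in rho_b)  # Eq. (59)

    # Mean number of common (top) neighbors of two bottom nodes:
    # Eq. (51) with top and bottom roles swapped.
    def m_bar(lam1, lam2):
        return n_top * sum(pk * r(kap, lam1) * r(kap, lam2) for kap, pk in rho_t.items())

    s = sum(p[l1] * p[l2] * m_bar(l1, l2) for l1 in rho_b for l2 in rho_b)
    return s / (2.0 * l_nn - s)


# Assumed example: all bottom nodes share lambda = 6, r = kappa * lambda / C.
rho_t = {1.0: 0.5, 3.0: 0.5}
rho_b = {6.0: 1.0}
N = 1000
C = 2000.0  # mean(kappa) * N


def r(kappa, lam):
    return kappa * lam / C


print(c_b_kappa(1.0, rho_t, rho_b, r, N), c_b_kappa(3.0, rho_t, rho_b, r, N))
```

The nested sum over pairs of bottom hidden variables makes the cost quadratic in the number of distinct values, which is acceptable for small discrete spectra.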



IV. EXAMPLES OF BIPARTITE NETWORKS
WITH HIDDEN VARIABLES

Having the general formalism in place, we next consider
a couple of examples of bipartite networks with hidden
variables. The first example, uncorrelated networks,
is fairly standard. The second one, stratified networks,
is more unusual.



A. Uncorrelated Bipartite Networks 

Consider a random bipartite network composed of
nodes with degrees $\{k_i\}$ and $\{\ell_j\}$ drawn from distributions
$P(k)$ and $P(\ell)$. If nodes in the network are connected
at random, then two randomly chosen nodes with
degrees $k$ and $\ell$ are connected with probability $p = k\ell/E$,
where $E$ is the total number of links in the network.

Similar random uncorrelated networks can be constructed
in the hidden variable formalism. Consider a
network with hidden variables drawn from distributions
$\rho(\kappa)$ and $\rho(\lambda)$, in which node pairs are connected with
probability proportional to the product of nodes' hidden
variables:

$$r(\kappa, \lambda) = \frac{\kappa \lambda}{C}, \tag{64}$$

where $C$ is some normalization constant. The above form
of $r(\kappa, \lambda)$ implies that the hidden variable of a node can
be regarded as its target or expected degree. Indeed, if
we choose $C = \bar{\lambda} M$, then a top node with hidden variable
$\kappa$ gets $\kappa$ connections on average

$$\bar{k}(\kappa) = M \sum_{\lambda} \rho(\lambda)\, r(\kappa, \lambda) = \kappa. \tag{65}$$

Since the assumption of a sparse network, given by
Eq. (26), holds here, propagator $g(k|\kappa)$ is given by the
Poisson distribution:

$$g(k|\kappa) = e^{-\kappa} \kappa^{k} / k!, \tag{66}$$

and using Eqs. (14) and (29) one can obtain

$$\overline{k^2} = \overline{\kappa^2} + \bar{\kappa}. \tag{67}$$
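
The claim that the hidden variable acts as an expected degree when $C = \bar{\lambda} M$ is easy to verify for any finite sample of bottom hidden variables. The sketch below assumes the bottom hidden variables are given as a list and uses their sum as $C$; the helper names are hypothetical, and the explicit check that the connection probability does not exceed one is an addition of this example.

```python
def connection_probability(kappa, lam, c):
    """r(kappa, lambda) = kappa * lambda / C, Eq. (64)."""
    r = kappa * lam / c
    if r > 1.0:
        raise ValueError("kappa * lambda exceeds C, so r is not a probability")
    return r


def expected_top_degree(kappa, lambdas):
    """Sum of connection probabilities over all bottom nodes, cf. Eq. (65)."""
    c = sum(lambdas)  # C = mean(lambda) * M for this sample
    return sum(connection_probability(kappa, lam, c) for lam in lambdas)


lambdas = [1.0, 2.0, 3.0, 6.0] * 25
print(expected_top_degree(3.0, lambdas))
```

The guard matters for broad distributions: if the largest hidden variables are not kept below roughly the square root of $C$, some products exceed $C$ and the product form can no longer be used as is.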



One type of nodes in real bipartite networks is often
characterized by scale-free degree distributions, while the
degrees of nodes of the other type can follow either fat-tailed
or Poissonian distributions [7]. Our uncorrelated formalism
can account for both options. The former case is
actually simpler, and the properties of top and bottom
nodes can be obtained from each other via a simple swap
of notations. Therefore below we consider the latter case,
which is more typical for real networks.

Specifically, let $\kappa$ be power-law distributed:

$$\rho(\kappa) = (\gamma - 1)\, \kappa_0^{\gamma - 1}\, \kappa^{-\gamma}, \qquad \kappa \ge \kappa_0, \tag{68}$$

where power-law exponent $\gamma$ and minimum expected degree
$\kappa_0$ are parameters of the distribution. The resulting
degree distribution of the top node domain is given by
Eqs. (14) and (28), which yield

$$P(k) = (\gamma - 1)\, \kappa_0^{\gamma - 1}\, \frac{\Gamma[k - \gamma + 1, \kappa_0]}{\Gamma[k + 1]}, \tag{69}$$

where $\Gamma[a, s]$ is the incomplete gamma function. In the
large $k$ limit we can approximate $P(k)$ by

$$P(k) \sim (\gamma - 1)\, \kappa_0^{\gamma - 1}\, k^{-\gamma}. \tag{70}$$
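
The degree distribution of the top nodes can also be obtained without special functions, by integrating the Poisson propagator of Eq. (66) against the power-law density of Eq. (68) numerically. The sketch below does this with a trapezoid rule in the variable $u = \ln \kappa$; the function names, the integration cutoff, and the number of steps are assumptions chosen for this illustration.

```python
import math


def degree_pmf(k, gamma, kappa0, steps=20000):
    """P(k) from Eq. (14) with the Poisson propagator and power-law rho(kappa).

    Integrates rho(kappa) * exp(-kappa) * kappa**k / k! over kappa >= kappa0
    using the substitution kappa = exp(u) and the trapezoid rule.
    """
    u_min = math.log(kappa0)
    # Upper cutoff far beyond the Poisson peak near kappa ~ k (an assumption).
    u_max = math.log(k + 40.0 * math.sqrt(k + 1.0) + 50.0)
    h = (u_max - u_min) / steps
    log_c = (math.log(gamma - 1.0) + (gamma - 1.0) * math.log(kappa0)
             - math.lgamma(k + 1))
    total = 0.0
    for i in range(steps + 1):
        u = u_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # Integrand in u: kappa^(k - gamma + 1) * exp(-kappa), with kappa = e^u.
        total += weight * math.exp(log_c - math.exp(u) + (k - gamma + 1.0) * u)
    return total * h


def degree_pmf_asymptotic(k, gamma, kappa0):
    """Large-k power law of Eq. (70)."""
    return (gamma - 1.0) * kappa0 ** (gamma - 1.0) * k ** (-gamma)


for k in (10, 50, 200):
    print(k, degree_pmf(k, 2.5, 1.0) / degree_pmf_asymptotic(k, 2.5, 1.0))
```

The printed ratios approach one as $k$ grows, which is the content of Eq. (70): the Poisson smearing changes the distribution at small degrees but leaves the tail exponent equal to $\gamma$.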



We note that the distribution $P(k)$ of top node degrees
does not depend on a specific form of the hidden variable
distribution $\rho(\lambda)$ in the bottom node domain. Let the
latter be a delta function $\rho(\lambda) = \delta(\lambda - \lambda_0)$, meaning that
all bottom nodes have the same value of their hidden
variable equal to $\lambda_0$. Then using the same Eqs. (14) and (28)
swapped for the bottom nodes, we immediately conclude
that the distribution of bottom node degrees is Poissonian:

$$P(\ell) = e^{-\lambda_0} \lambda_0^{\ell} / \ell!. \tag{71}$$



We now turn our attention to the unipartite projections.
We first consider the top node projection. We use
Eq. (42) to compute the average degree of $\kappa$-nodes:

$$\bar{k}_u(\kappa) = \lambda_0 \kappa. \tag{72}$$

Therefore the average degree in the top unipartite projection
is

$$\bar{k}_u = \lambda_0 \bar{\kappa}. \tag{73}$$

The degree distribution in the projection is given by
Eqs. (44) and (39):

$$P(k_u) = (\gamma - 1) \left[ \lambda_0 \kappa_0 \right]^{\gamma - 1} \frac{\Gamma[k_u - \gamma + 1, \lambda_0 \kappa_0]}{\Gamma[k_u + 1]}, \tag{74}$$

that is, this distribution is also a power law,

$$P(k_u) \sim (\gamma - 1) \left[ \lambda_0 \kappa_0 \right]^{\gamma - 1} k_u^{-\gamma}, \tag{75}$$

and the exponent of this power law is equal to the exponent
of the top power-law degree distribution in the
original bipartite network.

In the bottom unipartite projection, the average node
degrees are obtained in a similar manner to yield

$$\bar{\ell}_u = \bar{\ell}_u(\lambda) = \lambda_0\, \frac{\overline{\kappa^2}}{\bar{\kappa}}. \tag{76}$$

For $\gamma < 3$, $\overline{\kappa^2}$ depends on $N$ and diverges in the
thermodynamic limit. Therefore connection probability
$p_u(\lambda_1, \lambda_2)$ does not satisfy the condition of Eq. (38), and
we cannot approximate the degree distribution in the
bottom domain by Eq. (44) with Poissonian $p(k_u|\kappa)$ in
Eq. (39). However, if $\gamma > 3$, then $\overline{\kappa^2}$ is finite in the
thermodynamic limit, and the degree distribution is given
by

$$P(\ell_u) = e^{-\bar{\ell}_u} \left[ \bar{\ell}_u \right]^{\ell_u} / \ell_u!. \tag{77}$$



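As far as correlations are concerned, the conditional
hidden variable distributions are

$$p(\lambda|\kappa) = \delta(\lambda - \lambda_0), \tag{78}$$

$$p(\kappa|\lambda) = \frac{\kappa}{\bar{\kappa}}\, \rho(\kappa), \tag{79}$$

leading to the following expressions for the top and bottom
ANNDs given by Eq. (58):

$$\bar{\ell}_{nn}(k) = 1 + \lambda_0, \tag{80}$$

$$\bar{k}_{nn}(\ell) = 1 + \frac{\overline{\kappa^2}}{\bar{\kappa}}. \tag{81}$$

The average number of common neighbors is given by
Eq. (51) yielding, for top and bottom nodes,

$$\bar{m}(\kappa_1, \kappa_2) = \frac{\kappa_1 \kappa_2}{M}, \tag{82}$$

$$\bar{m}(\lambda_0, \lambda_0) = \frac{\lambda_0^2\, \overline{\kappa^2}}{N \bar{\kappa}^2}. \tag{83}$$

Finally, to compute clustering, we insert the expressions
for the average number of common neighbors
(82,83), ANNDs (80,81), and conditional distributions
(78,79) into Eq. (61), and obtain the average bipartite
clustering coefficient for top and bottom nodes:

$$c_B(\kappa) = \frac{\lambda_0^2\, \overline{\kappa^2} / (N \bar{\kappa}^2)}{2 \lambda_0 - \lambda_0^2\, \overline{\kappa^2} / (N \bar{\kappa}^2)}, \tag{84}$$

$$c_B(\lambda) = \frac{\left( \overline{\kappa^2} \right)^2 / (M \bar{\kappa}^2)}{2\, \overline{\kappa^2} / \bar{\kappa} - \left( \overline{\kappa^2} \right)^2 / (M \bar{\kappa}^2)}. \tag{85}$$

We observe that the clustering coefficient of a node does
not depend on its hidden variable in either case, i.e., that
it is constant. This constant decreases as the network
sizes $N$, $M$ increase, and vanishes in the thermodynamic
limit.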

To test our analytical results we perform simulations,
setting $N = 2M$, $N$ = 2 x 10^, $\gamma = 2.5$, $\kappa_0 = 1$, and
$\lambda_0 = 6$ to satisfy $N\bar{k} = M\bar{\ell}$. The degree distributions in
the top and bottom domains as well as in their unipartite
projections are shown in Fig. 2. The degree distribution
of top nodes in the original bipartite network and in its
top unipartite projection both follow a power law with
the same exponent $\gamma = 2.5$, see Fig. 2(a). As seen in
Fig. 2(b), the degree distribution in the bottom node
domain is well approximated by a Poisson distribution.
On the other hand, due to the divergent behavior of the
second moment $\overline{\kappa^2}$ of the top degree distribution, the
degree distribution in the bottom unipartite projection
seems to follow a truncated power law. The measured
values of $\bar{k}_u = 20.0$ and $\bar{\ell}_u \approx 119$ are in good agreement
with Eqs. (73,76) since $\bar{\kappa} = 2.85$ and $\overline{\kappa^2} = 62$ for the
selected parameters.

In Fig. 3(a) we plot the ANNDs, and confirm that they
are independent of node degrees as Eqs. (80,81) predict
for uncorrelated networks.

To test the dependence of the average bipartite clustering
coefficient on the network size, we generate a number
of uncorrelated bipartite networks of different sizes
and values of $\gamma$. While sampling hidden variables $\kappa$
for top nodes, we impose the cutoff of $\kappa_{max} \sim N^{1/2}$
to avoid structural degree correlations [21]. Therefore,
$\overline{\kappa^2} \sim N^{(3-\gamma)/2}$, and the average bipartite clustering
coefficient scales as $\bar{c}_B \sim N^{-\delta}$ with $\delta = (\gamma - 1)/2$ for
$2 < \gamma < 3$. In Fig. 3(b) we confirm this scaling. The figure
shows the measured bipartite clustering coefficients
as a function of $N$ for different values of $\gamma$.

FIG. 2. (Color Online) Degree distributions in a random
uncorrelated bipartite network. (a) Degree distributions in the top
domain (green circles) and top unipartite projection (red squares).
The solid lines are the analytical predictions from Eqs. (69,74).
(b) Degree distributions in the bottom domain (green circles) and
bottom unipartite projection (red squares). Both plots correspond
to the model with $N$ = 2 x 10^, $M$ = 10^, $\gamma = 2.5$, $\kappa_0 = 1$, and
$\lambda_0 = 6$.

FIG. 3. (Color Online) (a) The average nearest neighbor degrees
of top (blue triangles) and bottom (magenta hexagons) nodes in
an uncorrelated bipartite network with $N$ = 2 x 10^, $M$ = 10^,
$\gamma = 2.5$, $\kappa_0 = 1$, and $\lambda_0 = 6.0$. The solid lines are the analytical
predictions in Eqs. (80,81). (b) The average bipartite clustering
coefficient for top nodes in uncorrelated bipartite networks as a
function of network size for $\gamma = 2.1$, $\gamma = 2.5$ and $\gamma = 3.0$.
The solid lines are the theoretical predictions of $\bar{c}_B \sim N^{-\delta}$ with
$\delta = (\gamma - 1)/2$.



B. Stratified Bipartite Networks 

The original stratified unipartite network model was
considered by Leicht et al. [18]. In this model, $N$ nodes
are assigned random integer ages $t_i = 1, \ldots, t_{max}$ with
uniform probability, and then links are created between
node pairs with probability

$$P(\Delta t) = p_0\, e^{-a \Delta t}, \tag{86}$$

where $p_0$ and $a$ are model parameters. The motivation for
this model in [18] was to have a simplified social model in
which individuals preferably connect to other individuals
of similar age. The stratified model was used in [18] to
test the ability of different node similarity measures to
infer relative node ages.

Here we generalize the stratified network model as follows.
The networks in the model consist of $N$ top and $M$
bottom nodes. All nodes are assigned hidden variables
$\kappa$ and $\lambda$ drawn from the continuous uniform distribution
on interval $[0, T]$, $\rho(\kappa) = \rho(\lambda) = 1/T$. To eliminate finite
size effects we impose the periodic boundary condition,
meaning that nodes are uniformly scattered along a circle,
and their hidden variables are their angular coordinates
if we set $T = 2\pi$. To simplify the calculations we
use the squared distances in the connection probability
function:

$$r(\kappa, \lambda) = r_0\, e^{-a \|\lambda - \kappa\|^2}, \tag{87}$$

where $\|\lambda - \kappa\|$ is the angular distance between $\lambda$ and $\kappa$:

$$\|\lambda - \kappa\| = \pi - \left| \pi - |\lambda - \kappa| \right|. \tag{88}$$

We first calculate the degree distributions for the top
nodes. Due to the uniform distribution of hidden variables,
the expected degree of a node is independent of its
hidden variable $\kappa$. Using Eqs. (23) and (24) we obtain

$$\bar{k} = \bar{k}(\kappa) = \frac{M r_0}{2 \sqrt{\pi a}}\, \mathrm{Erf}(\pi \sqrt{a}), \tag{89}$$
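
The angular distance of Eq. (88) and the average degree of Eq. (89) can be cross-checked by integrating the connection probability of Eq. (87) numerically around the circle. The sketch below uses a midpoint rule; the function names, the choice of integration rule, and the example parameters are assumptions of this illustration.

```python
import math


def angular_distance(lam, kappa):
    """Angular distance on the circle, Eq. (88), for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(lam - kappa))


def mean_degree_numeric(m_bottom, r0, a, steps=20000):
    """Eq. (23) for the stratified model: M * (1/2pi) * integral of r(0, lambda)."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += r0 * math.exp(-a * angular_distance(lam, 0.0) ** 2)
    return m_bottom * total * h / (2.0 * math.pi)


def mean_degree_closed_form(m_bottom, r0, a):
    """Closed form of Eq. (89)."""
    return m_bottom * r0 / (2.0 * math.sqrt(math.pi * a)) * math.erf(math.pi * math.sqrt(a))


print(mean_degree_numeric(1000, 1.0, 50.0), mean_degree_closed_form(1000, 1.0, 50.0))
```

Because the hidden variables are uniform on the circle, evaluating the integral at a single reference angle (here zero) is sufficient; any other top hidden variable gives the same average degree.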



where $\mathrm{Erf}(x)$ is the error function. For $\bar{k}$ to be independent
of network size, we must set $r_0/\sqrt{a} \sim 1/M$. Another
natural choice would be to constrain $r_0 \sim M^{-1}$,
but this choice would lead to bipartite clustering coefficients
dependent on the network size. Constant bipartite
clustering can be achieved by setting

$$r_0 = 1, \quad \text{and} \quad a = \alpha M^2, \tag{90}$$

where $\alpha$ is a parameter controlling the average degree
in the network. With the above choice of parameters
Eq. (89) simplifies to

$$\bar{k} = \bar{k}(\kappa) = \frac{1}{2 \sqrt{\pi \alpha}}. \tag{91}$$

Similarly, the average degree in the bottom node domain
is given by

$$\bar{\ell} = \bar{\ell}(\lambda) = \frac{N}{M}\, \bar{k}. \tag{92}$$



Since the connection probability $r(\kappa, \lambda)$ does not scale as $M^{-1}$, the propagator $g(k|\kappa)$ is not given by Eq. (28). Instead we have to use Eq. (22) to compute the propagator, yielding

$$\hat{g}(z|\kappa) = e^{-\bar{k}\, \mathrm{Li}_{3/2}(1 - z)}, \tag{93}$$



where $\mathrm{Li}_n(x)$ is the polylogarithm. Equation (93) can be used to calculate higher moments of the degree distribution. For example, the second moment is

$$\overline{k^2} = \bar{k}^2 + \bar{k}\left(1 - \frac{1}{\sqrt{2}}\right). \tag{94}$$



That is, similar to the Poisson distribution, the standard deviation of $g(k|\kappa)$ is

$$\sigma = \sqrt{\overline{k^2} - \bar{k}^2} \propto \sqrt{\bar{k}}. \tag{95}$$
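Because links appear independently, the degree of a top node is a sum of independent Bernoulli variables, so its mean is the sum of $r$ and its variance is the sum of $r(1 - r)$ over bottom nodes. The sketch below evaluates both sums with $r_0 = 1$ for bottom nodes placed on a regular grid around the circle, an idealization of the uniform distribution that is assumed here to avoid sampling noise. The function name and parameter values are illustrative.

```python
import math

def degree_mean_and_variance(M: int, alpha: float, r0: float = 1.0):
    """Mean and variance of the degree of a top node at angle 0,
    with M bottom nodes evenly spaced on the circle (an idealization)."""
    mean = 0.0
    var = 0.0
    for j in range(M):
        lam = 2.0 * math.pi * j / M
        d = math.pi - abs(math.pi - lam)      # angular distance to angle 0
        r = r0 * math.exp(-alpha * d * d)     # connection probability
        mean += r
        var += r * (1.0 - r)
    return mean, var

M = 2000
alpha = M * M / (4.0 * math.pi * 20.0 ** 2)   # targets an average degree of 20
mean, var = degree_mean_and_variance(M, alpha)
print(mean, var, var / mean)
```

The ratio of variance to mean does not depend on the target degree in this regime, which is the statement that the standard deviation grows as the square root of the average degree.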



According to Eq. (59), the average nearest neighbor degree is independent of the node's hidden variable:

$$\bar{\ell}_{nn}(\kappa) = \bar{\ell}, \tag{96}$$

because node degrees are not correlated with their hidden variables, see Eq. (91). Therefore, despite strong correlation between hidden variables of connected nodes, there are no degree correlations. The ANND can be obtained by inserting $\bar{\ell}_{nn}(\kappa)$ from Eq. (96) into Eq. (58) to yield



(97) 



The average number of common neighbors between bottom nodes with hidden variables $\lambda_1$ and $\lambda_2$ is given by Eq. (51), which now becomes

$$m(\lambda_1, \lambda_2) = \frac{N r_0^2}{2\pi} \int_{-\pi}^{\pi} d\kappa\; e^{-\alpha\left(\|\kappa - \lambda_1\|^2 + \|\kappa - \lambda_2\|^2\right)}. \tag{98}$$

FIG. 4. (Color online) Stratified bipartite networks. (a) Degree distributions of top (green circles) and bottom (red rectangles) nodes. (b) Average nearest neighbor degrees for top and bottom nodes as a function of node degree. (c) Average bipartite clustering coefficients of top and bottom nodes as a function of node degree. All the plots are for stratified bipartite networks with N = 10^, M = 2 × 10^, and $\bar{k} = 20$.



To compute $m(\lambda_1, \lambda_2)$ we first change the integration variable to $x = \sqrt{\alpha}\,\kappa$ so that the new integration limits are $\pm\sqrt{\alpha}\,\pi$. Since $\sqrt{\alpha} \sim M$, in the thermodynamic limit the integration interval becomes $(-\infty, \infty)$, leading to

$$m(\lambda_1, \lambda_2) = \frac{N r_0^2}{2\sqrt{2\pi\alpha}}\; e^{-\alpha \|\lambda_1 - \lambda_2\|^2 / 2}. \tag{99}$$
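The large-$\alpha$ approximation behind Eq. (99) can be checked numerically. The sketch below assumes that Eq. (51) takes the form $m(\lambda_1, \lambda_2) = N \int \rho(\kappa)\, r(\kappa, \lambda_1)\, r(\kappa, \lambda_2)\, d\kappa$ with $\rho(\kappa) = 1/2\pi$, integrates it around the circle with the midpoint rule, and compares the result with the Gaussian closed form obtained by extending the integration limits to infinity. Function names and parameter values are illustrative.

```python
import math

def ang(a: float, b: float) -> float:
    """Angular distance of Eq. (88)."""
    return math.pi - abs(math.pi - abs(a - b))

def common_neighbors_numeric(lam1, lam2, N, alpha, r0=1.0, steps=20000):
    """Midpoint-rule estimate of N * <r(kappa, lam1) * r(kappa, lam2)> over uniform kappa."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for i in range(steps):
        kappa = (i + 0.5) * h
        d1, d2 = ang(kappa, lam1), ang(kappa, lam2)
        total += r0 * r0 * math.exp(-alpha * (d1 * d1 + d2 * d2))
    return N * total * h / (2.0 * math.pi)

def common_neighbors_closed_form(lam1, lam2, N, alpha, r0=1.0):
    """Gaussian closed form, valid when 1/sqrt(alpha) is much smaller than pi."""
    d = ang(lam1, lam2)
    return N * r0 * r0 / (2.0 * math.sqrt(2.0 * math.pi * alpha)) * math.exp(-alpha * d * d / 2.0)

print(common_neighbors_numeric(1.0, 1.1, 1000, 200.0))
print(common_neighbors_closed_form(1.0, 1.1, 1000, 200.0))
```

The two values should agree closely whenever the width of the Gaussian, of order $1/\sqrt{\alpha}$, is small compared with $\pi$; for small $\alpha$ the periodic boundary makes the closed form inaccurate.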



Inserting the expressions for $m(\lambda_1, \lambda_2)$ and $\bar{\ell}_{nn}(\kappa)$ into Eqs. (61) and (62) yields the average bipartite clustering coefficient:



CB{k) = cb{i^) 



(100) 



To validate the obtained analytical expressions we perform numerical simulations, generating networks with N = 10^ and M = 2 × 10^. To generate a network with a target value of $\bar{k}$ we set $a$ according to Eq. (91). Figure 4(a) shows the degree distributions for the top and bottom nodes in the model. The degree distributions are well approximated by Poisson distributions with the averages $\bar{k} = 20$ and $\bar{\ell} = \bar{k} N/M$. Figure 4(b) confirms that there are no degree correlations: $\bar{\ell}_{nn}(k)$ and $\bar{k}_{nn}(\ell)$ do not depend on node degree, and match Eq. (96). Figure 4(c) shows that clustering is strong, does not depend on either node degree or network sizes $N$ and $M$, and matches the prediction in Eq. (100). The high bipartite clustering in the stratified model is due to preferential linking of nodes with similar hidden variables.
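A small-scale version of such a simulation can be sketched as follows. This is not the code used to produce Fig. 4: the network sizes are much smaller, $r_0 = 1$ is assumed, and the decay parameter is set from the target average degree via Eq. (89) with the error function taken as 1. All names are illustrative.

```python
import math
import random

def generate_stratified_network(N, M, k_target, seed=0):
    """Return top and bottom degree lists for one sampled network."""
    rng = random.Random(seed)
    alpha = M * M / (4.0 * math.pi * k_target * k_target)  # assumes r0 = 1
    kappas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]   # top nodes
    lambdas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(M)]  # bottom nodes
    top_deg = [0] * N
    bottom_deg = [0] * M
    for i, kappa in enumerate(kappas):
        for j, lam in enumerate(lambdas):
            d = math.pi - abs(math.pi - abs(lam - kappa))  # angular distance
            if rng.random() < math.exp(-alpha * d * d):    # link with probability r
                top_deg[i] += 1
                bottom_deg[j] += 1
    return top_deg, bottom_deg

top_deg, bottom_deg = generate_stratified_network(N=200, M=400, k_target=20.0)
print(sum(top_deg) / len(top_deg))        # close to the target of 20
print(sum(bottom_deg) / len(bottom_deg))  # equals the top average times N/M
```

The double loop costs $O(NM)$ operations, which is adequate for a sanity check but would need a neighbor-search structure on the circle for network sizes comparable to those in the text.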



bipartite networks, nodes of both types reside in hidden variable spaces, and the connection probability between a pair of nodes is a function of their hidden variables. The independence of link appearance in the model allows one to derive analytical expressions for many important topological properties of the modeled networks.

The formalism developed here builds on the hidden variable formalism for unipartite networks [9]. Some basic structural properties of bipartite networks, such as the degree distributions and correlations, are straightforward generalizations of those in unipartite networks. Some other characteristics, such as unipartite projections and bipartite clustering, are unique to bipartite networks.

The hidden variable formalism has proven to be a powerful tool in studying the structure and function of complex networks [22-25]. One particular application of interest to us is network geometry and navigability [26-29]. The formalism developed here can also be useful in inferring individual characteristics, attributes, and annotations of nodes in real bipartite networks.

Appendix A: Derivation of the bipartite clustering coefficient



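Here we provide the detailed derivation of the bipartite clustering coefficient defined in Eq. (60). We estimate the average bipartite clustering coefficient of a node with hidden variable $\kappa$ by calculating the ensemble averages of the numerator and the denominator in Eq. (60):

$$c_B(\kappa) = \frac{\left\langle \sum_{j>l} (m_{jl} - 1) \right\rangle}{\left\langle \sum_{j>l} (k_j + k_l - m_{jl} - 1) \right\rangle}. \tag{A1}$$

We first focus on the numerator in Eq. (A1):

$$\left\langle \sum_{j>l} (m_{jl} - 1) \right\rangle = \sum_k g(k|\kappa)\, \frac{k(k-1)}{2} \sum_{\lambda_1, \lambda_2} \rho(\lambda_1|\kappa)\, \rho(\lambda_2|\kappa) \sum_m (m - 1)\, P_{\lambda_1, \lambda_2}(m - 1), \tag{A2}$$

where $g(k|\kappa)$ is the $\kappa$-to-$k$ propagator, $\rho(\lambda_1|\kappa)$ is the conditional probability that a bottom node has hidden variable $\lambda_1$ provided it is connected to a top node with $\kappa$, and $P_{\lambda_1, \lambda_2}(m - 1)$ is the probability that two bottom nodes with $\lambda_1$ and $\lambda_2$ have exactly $m - 1$ common neighbors besides $i$. Equation (A2) simplifies to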

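$$\left\langle \sum_{j>l} (m_{jl} - 1) \right\rangle = \frac{1}{2} \langle k(k-1) \rangle_\kappa \sum_{\lambda_1, \lambda_2} \rho(\lambda_1|\kappa)\, \rho(\lambda_2|\kappa)\, m(\lambda_1, \lambda_2). \tag{A3}$$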



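Next we compute the denominator of Eq. (A1):

$$\left\langle \sum_{j>l} (k_j + k_l - m_{jl} - 1) \right\rangle = \left\langle \sum_{j>l} (k_j - 1 + k_l - 1) \right\rangle - \left\langle \sum_{j>l} (m_{jl} - 1) \right\rangle. \tag{A4}$$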



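The sum $\langle \sum_{j>l} (m_{jl} - 1) \rangle$ is the same as in the numerator, so that we only need to compute $\langle \sum_{j>l} (k_j - 1 + k_l - 1) \rangle$:

$$\sum_{j>l} (k_j - 1 + k_l - 1) = (k_i - 1) \sum_{j=1}^{k_i} (k_j - 1) = (k_i - 1)\, k_i\, (\overline{k_j} - 1), \tag{A5}$$

where $k_i$ is the degree of node $i$. Therefore,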



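$$\left\langle \sum_{j>l} (k_j - 1 + k_l - 1) \right\rangle = \sum_k g(k|\kappa)\, (k - 1)\, k \sum_\lambda \rho(\lambda|\kappa)\, \bar{\ell}(\lambda) = \langle k(k-1) \rangle_\kappa\, \bar{\ell}_{nn}(\kappa). \tag{A6}$$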



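Using Eqs. (A3) and (A6) we finally obtain

$$c_B(\kappa) = \frac{\sum_{\lambda_1, \lambda_2} \rho(\lambda_1|\kappa)\, \rho(\lambda_2|\kappa)\, m(\lambda_1, \lambda_2)}{2\bar{\ell}_{nn}(\kappa) - \sum_{\lambda_1, \lambda_2} \rho(\lambda_1|\kappa)\, \rho(\lambda_2|\kappa)\, m(\lambda_1, \lambda_2)}. \tag{A7}$$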
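For comparison with the ensemble average in Eq. (A7), the bipartite clustering coefficient of a single node can be computed directly from an adjacency structure. The sketch below assumes that Eq. (60) has the ratio form implied by the numerator in Eq. (A2) and the denominator in Eq. (A4): a sum of $(m_{jl} - 1)$ over pairs of neighbors $j, l$ of the node, divided by the sum of $(k_j + k_l - m_{jl} - 1)$ over the same pairs. The data layout, function name, and the convention of returning 0 for an empty denominator are assumptions made for illustration.

```python
from itertools import combinations

def bipartite_clustering(node, top_adj, bottom_adj):
    """Bipartite clustering of a top node.

    top_adj maps each top node to the set of its bottom neighbors;
    bottom_adj maps each bottom node to the set of its top neighbors.
    """
    numerator = 0
    denominator = 0
    for j, l in combinations(sorted(top_adj[node]), 2):
        m_jl = len(bottom_adj[j] & bottom_adj[l])   # common neighbors, including `node`
        k_j, k_l = len(bottom_adj[j]), len(bottom_adj[l])
        numerator += m_jl - 1
        denominator += k_j + k_l - m_jl - 1
    return numerator / denominator if denominator > 0 else 0.0

# Toy graph: top nodes "A" and "B", bottom nodes 1, 2, 3.
top_adj = {"A": {1, 2, 3}, "B": {1, 2}}
bottom_adj = {1: {"A", "B"}, 2: {"A", "B"}, 3: {"A"}}
print(bipartite_clustering("A", top_adj, bottom_adj))  # 1/3 for this toy graph
```

In the toy graph only the pair (1, 2) shares a second top neighbor, so it contributes 1 to the numerator, while each of the three pairs contributes 1 to the denominator.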



